Pattern Recognition Applied To The Acquisition Of A Grammatical Classification System From Unrestricted English Text

Authors

  • Eric Atwell
  • Nicos Frixou Drakos
Abstract

Within computational linguistics, the use of statistical pattern matching is generally restricted to speech processing. We have attempted to apply statistical techniques to discover a grammatical classification system from a Corpus of 'raw' English text. A discovery procedure is simpler for a simpler language model; we assume a first-order Markov model, which (surprisingly) is shown elsewhere to be sufficient for practical applications. The extraction of the parameters of a standard Markov model is theoretically straightforward; however, the huge size of the standard model for a Natural Language renders it incomputable in reasonable time. We have explored various constrained models to reduce computation, which have yielded results of varying success.

Pattern recognition and NLP

In the area of language-related computational research, there is a perceived dichotomy between, on the one hand, "Natural Language" research dealing principally with syntactic and other analysis of typed text, and on the other hand, "Speech Processing" research dealing with synthesis, recognition, and understanding of speech signals. This distinction is not based merely on a difference of input and/or output media, but seems also to correlate with noticeable differences in the assumptions and techniques used in research. One example is the use of statistical pattern recognition techniques: these are used in a wide variety of computer-based research areas, and many speech researchers take it for granted that such methods are part of their stock in trade. In contrast, statistical pattern recognition is hardly ever even considered as a technique to be used in "Natural Language" text analysis.
One reason for this is that speech researchers deal with "real", "unrestricted" data (speech samples), whereas much NLP research deals with highly restricted language data, such as examples intuited by theoreticians, or simplified English as allowed by a dialogue system, such as a Natural Language Database Query system. Chomsky (57) did much to discredit the use of representative text samples or Corpora in syntactic research; he dismissed both statistics and semantics as being of no use to syntacticians: "Despite the undeniable interest and importance of semantic and statistical studies of language, they appear to have no direct relevance to the problem of determining or characterizing the set of grammatical utterances" (Chomsky 57 p.17). Subsequent research in Computational Linguistics has shown that Semantics is far more relevant and important than Chomsky gave credit for. Phenomenal advances in computer power and capabilities mean that we can now try statistical pattern recognition techniques which would have been incomputable in Chomsky's early days. Therefore, we felt that the case for Corpus-based statistical Pattern Recognition techniques should be reopened. Specifically, we have investigated the possibility of using Pattern Recognition techniques for the acquisition of a grammatical classification system from Unrestricted English text.
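
As a rough illustration of the first-order Markov assumption mentioned in the abstract, the sketch below estimates transition probabilities by relative frequency over adjacent tokens. It is a minimal sketch only, not the authors' procedure: it works on observed word tokens rather than on discovered grammatical classes, and the function name `bigram_transitions` and the toy corpus are hypothetical. The final line hints at why a full model is costly: a table over a vocabulary of size V has V squared possible cells.

```python
from collections import Counter, defaultdict


def bigram_transitions(tokens):
    """Estimate P(next | current) by relative frequency over adjacent pairs.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Every token except the last one acts as a "current" context.
    context_counts = Counter(tokens[:-1])
    transitions = defaultdict(dict)
    for (current, nxt), count in pair_counts.items():
        transitions[current][nxt] = count / context_counts[current]
    return dict(transitions)


# Toy corpus (an assumption for illustration; not data from the paper).
tokens = "the cat sat on the mat and the dog sat on the rug".split()
model = bigram_transitions(tokens)
vocabulary_size = len(set(tokens))

print(model["the"])          # distribution over words following "the"
print(vocabulary_size ** 2)  # number of cells in a full transition table
```

On real unrestricted text the vocabulary runs to tens of thousands of word types, so the full table becomes very large, which is consistent with the abstract's motivation for exploring constrained models.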

Similar resources

Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten

Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheque processing, and conversion of handwritten documents into structured text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...

Level of Grammatical Proficiency and Acquisition of Functional Projections: The case of Iranian learners of English language

Unlike Lexical Projections, Functional Projections (Extended Projections) are more abstract in nature. Therefore, Functional Projections seem to be acquired later than Lexical Projections by L2 learners. The present study investigates Iranian L2 learners’ acquisition of English Extended Projections taking into account their level of grammatical proficiency. Specifically, the aim is ...

Reassembling Formal Features in Articles by L1 Persian Learners of L2 English

There has been considerable debate over what the sources of morphological variation in second language acquisition are. From among various hypotheses put forth on the topic, the feature reassembly hypothesis (Lardiere, 2005) assumes that it is the reconfiguration of features in the L2 which causes variation between the performance of natives and non-natives. Acknowle...

A Comparative Study of Nominalization in an English Applied Linguistics Textbook and its Persian Translation

Among the linguistic resources for creating grammatical metaphor, nominalization rewords processes and properties metaphorically as nouns within the experiential metafunction of language. Following Halliday's (1998a) classification of grammatical metaphor, the current study investigated nominalization exploited in an English applied linguistics textbook and its corresponding Persian translati...

Publication date: 1987